Stemming (finding lemmas) is used to search for word variations. For example, if the user searches for "tables", they will receive results that include "table". This is more powerful than simple wildcard searching, because it can match shorter as well as longer forms.
The LemmaWeightFactor property in the Configuration is set to 0.5 by default and controls how heavily lemma matches are weighted in the results. If the user searches for "tables", occurrences of "table" weigh half as much as occurrences of "tables". This ranking helps prevent the top results from being cluttered with inexact matches to the searched terms.
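To make the effect of the factor concrete, here is a minimal sketch of a linear weighting. The engine's actual ranking formula is not described here, so the `LemmaWeightingSketch` class and its `Score` method are hypothetical illustrations, not part of the product API.

```csharp
using System;

// Hypothetical helper, NOT part of the search engine API: it only illustrates
// how a lemma weight factor could scale inexact (lemma) matches in a score.
public static class LemmaWeightingSketch
{
    // exactMatches: occurrences of the word exactly as searched (e.g. "tables")
    // lemmaMatches: occurrences of other forms found by stemming (e.g. "table")
    public static double Score(int exactMatches, int lemmaMatches, double lemmaWeightFactor)
    {
        if (lemmaWeightFactor < 0)
        {
            throw new ArgumentOutOfRangeException("lemmaWeightFactor");
        }
        // Assumed linear model: each lemma match counts for a fraction of an exact match.
        return exactMatches + lemmaMatches * lemmaWeightFactor;
    }
}
```

Under this assumed model and the default factor of 0.5, a document with two occurrences of "tables" scores 2.0, while a document with three occurrences of "table" scores 1.5, so the exact matches rank higher.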
You can choose the language to use for stemming with the LemmaLanguage property in the Configuration. Note that stemming algorithms are language-specific, so fewer languages may be supported for stemming than for the spelling dictionaries that are available for the SearchSuggestions control.
English and German lemmas are built in to the product DLLs; other languages can be downloaded from this page. Unzipped files should be placed in the index directory, and the LemmaLanguage property in the Configuration should be set accordingly.
The Central Event System includes an Action event called "GetWordVariations", which is fired when the search engine needs lemmas. By hooking into this event, the lemmas provided by the default generator can be modified or replaced entirely. Note that variations don't necessarily have to be lemmas (shared stems); they can be any words that should be searched for at the same time, such as synonyms and substitutes. For example, a company with a furniture polish product named "Shine-O" may like to include "Shine-O" as a variation for "polish".
Within the plug-in project, handle the Action event, and specifically the GetWordVariations Action.
void CentralEventDispatcher_Action(object sender, Keyoti.SearchEngine.Events.ActionEventArgs e)
{
    if (e.ActionData.Name == Keyoti.SearchEngine.Events.ActionName.GetWordVariations)
    {
        //Data is an object[] holding the searched word and the list of its variations
        Keyoti.SearchEngine.DataAccess.Word theWord = ((object[])e.ActionData.Data)[0] as Keyoti.SearchEngine.DataAccess.Word;
        ArrayList variations = (e.ActionData.Data as object[])[1] as ArrayList;
        if (theWord != null && variations != null && theWord.WordContent == "polish")
        {
            //existing variations can be read too (guarded, in case the list is empty)
            string firstVariation = variations.Count > 0 ? variations[0].ToString() : null;
            variations.Insert(0, "Shine-O");//insert a word we also want to search for
        }
    }
}
Private Sub CentralEventDispatcher_Action(ByVal sender As Object, ByVal e As Keyoti.SearchEngine.Events.ActionEventArgs)
    If (e.ActionData.Name = Keyoti.SearchEngine.Events.ActionName.GetWordVariations) Then
        'Data is an Object() holding the searched word and the list of its variations
        Dim theWord As Keyoti.SearchEngine.DataAccess.Word = CType(CType(e.ActionData.Data, Object())(0), Keyoti.SearchEngine.DataAccess.Word)
        Dim variations As ArrayList = CType(CType(e.ActionData.Data, Object())(1), ArrayList)
        If (theWord.WordContent = "polish") Then
            'existing variations can be read too (guarded, in case the list is empty)
            Dim firstVariation As String = If(variations.Count > 0, variations(0).ToString(), Nothing)
            'insert a word we also want to search for
            variations.Insert(0, "Shine-O")
        End If
    End If
End Sub
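When more than a handful of words need extra variations, the hard-coded "polish" check can be replaced with a lookup table. The following is a minimal sketch using only .NET collection types; the `VariationTableSketch` class, its `AddVariations` method and the table contents are hypothetical names chosen for illustration, not part of the product API.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: a table-driven version of the single-word substitution.
public static class VariationTableSketch
{
    // Illustrative data only; keys are compared case-insensitively.
    private static readonly Dictionary<string, string[]> Synonyms =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "polish", new[] { "Shine-O" } },
            { "sofa", new[] { "couch", "settee" } }
        };

    // Appends the table's entries for 'wordContent' to 'variations',
    // skipping any that are already present.
    public static void AddVariations(string wordContent, ArrayList variations)
    {
        if (wordContent == null || variations == null) return;

        string[] extras;
        if (!Synonyms.TryGetValue(wordContent, out extras)) return;

        foreach (string extra in extras)
        {
            if (!variations.Contains(extra))
            {
                variations.Add(extra);
            }
        }
    }
}
```

Inside the GetWordVariations handler, the body of the "polish" check could then become a single call such as `VariationTableSketch.AddVariations(theWord.WordContent, variations);`. Unlike the handler shown earlier, this sketch appends to the end of the list rather than inserting at index 0.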
The above code works with simple single-word substitutions; however, it does not support phrases, e.g. when the user searches for "lawn mower" and the term "lawnmower" should also be a match.
Within the plug-in project, handle the Action event, and specifically the QueryExpressionGroupCreated Action.
void dispatcher_Action(object sender, ActionEventArgs e)
{
    //'conf' is the Configuration instance held by the plug-in
    //Log everything - comment this line after debugging to optimize speed.
    Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("Plug-in Template Project", e.ActionData.Name.ToString(), conf);
    try
    {
        //each key is a phrase (one array entry per word), each value lists its alternatives
        Dictionary<string[], string[]> map = new Dictionary<string[], string[]>();
        map[new[] { "lawn", "mower" }] = new[] { "lawnmower", "lawn-mower" };
        map[new[] { "grass", "trimmer" }] = new[] { "lawnmower", "lawn-mower" };
        if (e.ActionData.Name == ActionName.QueryExpressionGroupCreated)
        {
            var el = (e.ActionData.Data as Search.GroupElement);
            //look for matches in the group
            if (el != null)
            {
                SearchForPhraseMatches(map, el, conf);
            }
        }
    }
    catch (Exception ex)
    {
        Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("Plug-in Template Project", "Exception: " + ex.ToString(), conf);
    }
}
static void SearchForPhraseMatches(Dictionary<string[], string[]> map, GroupElement group, Configuration configuration)
{
    var childElements = group.ChildElements;
    for (int j = 0; j < childElements.Count; j++)
    {
        if (childElements[j] is WordElement)
        {
            foreach (string[] key in map.Keys)
            {
                int p = 0;
                //advance while the following elements are words that match the phrase, word for word
                while (p < key.Length && j + p < childElements.Count
                    && childElements[j + p] is WordElement
                    && key[p] == (childElements[j + p] as WordElement).Content) { p++; }
                if (p == key.Length)
                {
                    //found a match for the phrase in the map
                    //replace contents of 'group' with 2 groups, the existing content of 'group' and a new group with the synonyms
                    GroupElement existingGroup = new GroupElement(configuration);
                    existingGroup.ChildElements.AddRange(group.ChildElements);
                    existingGroup.GroupOperator = group.GroupOperator;
                    group.ChildElements.Clear();
                    group.GroupOperator = LogicOperator.Or;
                    group.ChildElements.Add(existingGroup);
                    foreach (string synonym in map[key])
                    {
                        group.ChildElements.Add(new GroupElement(synonym, configuration));
                    }
                    //'group' has been rebuilt, so stop scanning it (only the first matching phrase in a group is expanded)
                    return;
                }
            }
        }
        else if (childElements[j] is GroupElement)
        {
            SearchForPhraseMatches(map, childElements[j] as GroupElement, configuration);
        }
    }
}
The 'map' variable holds a mapping of search phrases to their alternatives. In the above code, if the user query includes "lawn mower" or "grass trimmer", then matches will also be made for "lawnmower" and "lawn-mower". The mapped-to terms can also be multi-word phrases, e.g.
map[new[] { "drillpress" }] = new[] { "drill press" };
will match a document containing "drill press" if the user searches for "drillpress".
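The phrase lookup can be tried out in isolation, without the search engine. The sketch below applies the same word-by-word comparison to a plain list of query terms; the `PhraseMatchSketch` class and its methods are hypothetical names for illustration and do not use the product's GroupElement or WordElement types. Note that a Dictionary keyed by string[] compares keys by reference, not by content, so the map has to be enumerated rather than queried with a newly built array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, engine-independent illustration of the phrase matching step.
public static class PhraseMatchSketch
{
    // Returns the index in 'terms' where 'phrase' starts, or -1 when it is absent.
    public static int IndexOfPhrase(IList<string> terms, string[] phrase)
    {
        if (phrase.Length == 0) return -1;
        for (int j = 0; j + phrase.Length <= terms.Count; j++)
        {
            int p = 0;
            // Advance while consecutive terms equal consecutive words of the phrase.
            while (p < phrase.Length && terms[j + p] == phrase[p]) { p++; }
            if (p == phrase.Length) return j;
        }
        return -1;
    }

    // Collects the alternatives of every mapped phrase found in 'terms', without duplicates.
    public static List<string> Alternatives(IList<string> terms, IDictionary<string[], string[]> map)
    {
        var result = new List<string>();
        // Enumerate the entries: array keys are matched by reference, so map[newArray] would not find them.
        foreach (KeyValuePair<string[], string[]> entry in map)
        {
            if (IndexOfPhrase(terms, entry.Key) < 0) continue;
            foreach (string alternative in entry.Value)
            {
                if (!result.Contains(alternative)) result.Add(alternative);
            }
        }
        return result;
    }
}
```

With the "lawn mower" mapping shown earlier, calling `Alternatives` on the terms "cheap", "lawn", "mower" yields "lawnmower" and "lawn-mower", while the terms "lawn", "green", "mower" yield nothing because the words are not adjacent.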